Domain Decomposition method on GPU cluster

Authors

Yusuke Osaki

Ken-Ichi Ishikawa

Abstract

Parallel GPGPU computing for lattice QCD simulations is bottlenecked by GPU-to-GPU data communication because there is no facility for direct data exchange between GPUs. In this work we investigate the performance of a quark solver using the restricted additive Schwarz (RAS) preconditioner on a low-cost GPU cluster. We expect that the RAS preconditioner, with appropriate domain decomposition and task distribution, reduces the communication bottleneck. The GPU cluster we constructed consists of four PC boxes with two GPU cards attached to each box, for eight GPU cards in total. The compute nodes are connected by rather slow but low-cost Gigabit Ethernet. We include the RAS preconditioner in the single-precision part of the mixed-precision nested-BiCGStab algorithm, and the single-precision task is distributed to the multiple GPUs. The benchmarking is done with the O(a)-improved Wilson quark on a randomly generated gauge configuration of size 32^4. We observe a factor-of-two improvement in solver performance with the RAS preconditioner compared to that without the preconditioner, and find that the improvement mainly comes from the reduction of the communication bottleneck, as we expected.
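The idea in the abstract, a single-precision RAS preconditioner inside a double-precision outer iteration, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy 1D tridiagonal operator in place of the O(a)-improved Wilson operator, a plain defect-correction (Richardson) outer loop in place of nested BiCGStab, and a single process in place of multiple GPUs. All function names (`build_operator`, `make_ras`, `solve_mixed`) are hypothetical.

```python
import numpy as np

def build_operator(n, diag=4.0):
    """Toy stand-in for the lattice operator: tridiag(-1, diag, -1)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, idx] = diag
    A[idx[:-1], idx[:-1] + 1] = -1.0
    A[idx[1:], idx[1:] - 1] = -1.0
    return A

def make_ras(A, n_domains, overlap):
    """Return a function applying the RAS preconditioner in float32.

    Each subdomain solves on its overlapping (extended) index range but
    writes back only the entries it owns, so the results need no summation
    across subdomains.
    """
    n = A.shape[0]
    assert n % n_domains == 0, "sketch assumes equally sized subdomains"
    size = n // n_domains
    A32 = A.astype(np.float32)
    domains = []
    for d in range(n_domains):
        own_lo, own_hi = d * size, (d + 1) * size
        ext_lo, ext_hi = max(0, own_lo - overlap), min(n, own_hi + overlap)
        # Exact local solve, precomputed as a dense inverse (fine for a toy).
        local_inv = np.linalg.inv(A32[ext_lo:ext_hi, ext_lo:ext_hi])
        domains.append((own_lo, own_hi, ext_lo, ext_hi, local_inv))

    def apply(r):
        r32 = r.astype(np.float32)
        z = np.zeros(n, dtype=np.float32)
        for own_lo, own_hi, ext_lo, ext_hi, local_inv in domains:
            local = local_inv @ r32[ext_lo:ext_hi]
            # Restricted prolongation: keep only the owned entries.
            z[own_lo:own_hi] = local[own_lo - ext_lo:own_hi - ext_lo]
        return z

    return apply

def solve_mixed(A, b, precond, tol=1e-12, max_iter=50):
    """Double-precision residuals, single-precision preconditioned corrections."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for it in range(max_iter):
        r = b - A @ x                        # residual in float64
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        x += precond(r).astype(np.float64)   # correction computed in float32
    return x, max_iter

rng = np.random.default_rng(0)
n = 64
A = build_operator(n)
b = rng.standard_normal(n)
ras = make_ras(A, n_domains=4, overlap=2)
x, iters = solve_mixed(A, b, ras)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"iterations: {iters}, relative residual: {rel_res:.2e}")
```

In this sketch each subdomain solve touches only local data, and only the residual entries in the overlap region would have to be exchanged between neighbours. That is the property the abstract relies on to reduce inter-GPU traffic.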

Related articles

A scalable hybrid algorithm based on domain decomposition and algebraic multigrid for solving partial differential equations on a cluster of CPU/GPUs

Several of the top ranked supercomputers are based on the hybrid architecture consisting of a large number of CPUs and GPUs. Very high performance has been obtained for problems with special structures, such as FFT-based image processing or N-body based particle calculations. However, for the class of problems described by partial differential equations discretized by finite difference (or othe...

High-accuracy Optimization by Parallel Iterative Discrete Approximation and GPU Cluster Computing

High-accuracy optimization is the key component of time-sensitive applications in computer science such as machine learning, and we developed single-GPU Iterative Discrete Approximation Monte Carlo Optimization (IDA-MCS) and multi-GPU IDA-MCS in our previous research. However, because of the memory capacity constraints of GPUs in a workstation, single-GPU IDA-MCS and multi-GPU IDA-MCS may be in lo...

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An unavoidable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...

Molecular dynamics simulation of the supercooled Al melt on GPUs

The method of molecular dynamics (MD) is widely used to study static and dynamic properties of condensed matter [1]. In particular, an approach to study the relaxation of metastable states has been developed [2]. These states play an essential role in impulse loading processes such as shock compression, laser ablation, etc. Herewith we report on the simulation technique and results for crystallizatio...

Fast evaluation of Helmholtz potential on graphics processor units (GPUs)

The non-uniform grid method (NGM) is a fast algorithm to accelerate integral-equation-based methods of static and dynamic field evaluation in various areas such as electromagnetics, optics, magnetics, etc. The NG method reduces the computational complexity of direct evaluation of the interaction between N unknowns from O(N^2) to O(N) in the static and low-frequency regime and O(N log N) in the hi...

Journal title: CoRR

Volume: abs/1011.3318

Publication date: 2010
